Philosophical Text Analysis with Word Embeddings

Training and comparing Word2Vec word embeddings to analyze semantic differences among major philosophical schools.

This project was developed for the Data Semantics course in the Master’s Degree in Data Science.

“The meaning of a word is its use in the language” (Wittgenstein, 1953)

This quote from the philosopher Ludwig Wittgenstein captures the core idea behind word embedding algorithms, which infer the meaning of a word from the contexts in which it is used. It also serves as a symbolic bridge between philosophy and data semantics.
Philosophy is a profoundly human discipline: people throughout history have wrestled with the existential questions it raises. As a result, philosophical texts are numerous, lengthy, and fragmented across different schools of thought, each offering a distinct way of understanding knowledge and existence. Keeping track of these perspectives is challenging, given the diversity of thought and expression.
Yet all these different answers begin from the same questions, the questions of humanity. So why not train word embeddings to explore how different philosophical currents respond to the same fundamental issues?

The starting dataset for this task was obtained from The Philosophy Data Project, which shares similar goals and makes its data available on Kaggle. A preliminary analysis of the corpus, however, showed that it was too small to train robust word embedding models. To address this, the dataset was extended with the complete texts of the philosophers' works available on Project Gutenberg. The resulting division into philosophical schools is as follows:

  1. Nihilism: Nietzsche, Kierkegaard
  2. Empiricism: Berkeley, Hume, Locke
  3. Rationalism: Descartes, Leibniz, Malebranche, Spinoza
  4. German Idealism: Fichte, Hegel, Kant
  5. Analytic Philosophy: Kripke, Lewis, Moore, Popper, Quine, Russell, Wittgenstein
  6. Aristotle: Aristotle
  7. Plato: Plato

An eighth "slice" was created using abstracts from Wikipedia. This addition provided both more training data and a "neutral" comparison baseline.

The analysis was carried out using three main algorithms: Word2Vec, CADE, and SWEAT. Additionally, four custom functions were developed to support exploratory analysis across the eight slices.
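To make the idea of exploring the same word across slices more concrete, here is a minimal sketch of a nearest-neighbor comparison. It is not one of the project's four custom functions: the names `nearest_neighbors` and `compare_across_slices` are hypothetical, each slice is assumed to be a plain dictionary mapping words to vectors, and the 2-d vectors are invented purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, target, topn=1):
    """Return the `topn` words closest to `target` within a single slice."""
    target_vec = embeddings[target]
    scored = [
        (word, cosine(target_vec, vec))
        for word, vec in embeddings.items()
        if word != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:topn]]


def compare_across_slices(slices, target, topn=1):
    """Map each slice name to the target's nearest neighbors in that slice."""
    return {
        name: nearest_neighbors(emb, target, topn)
        for name, emb in slices.items()
        if target in emb  # skip slices whose vocabulary lacks the target
    }


# Toy vectors, invented for illustration only (NOT results from the project).
toy_slices = {
    "slice_a": {
        "language": [1.0, 0.0],
        "usage": [0.9, 0.1],
        "persuasion": [0.0, 1.0],
    },
    "slice_b": {
        "language": [0.0, 1.0],
        "usage": [1.0, 0.0],
        "persuasion": [0.1, 0.9],
    },
}

result = compare_across_slices(toy_slices, "language")
print(result)
```

Because each neighbor list is computed inside a single slice, this kind of comparison does not require the slices to share a vector space. Comparing the vectors themselves across slices does, which is the role an alignment method such as CADE plays.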

The results were compelling: the word embeddings successfully captured semantic differences that aligned with philosophical expectations. For example, the Analytic school’s emphasis on logic as a foundation for understanding reality was reflected in the embeddings for both “logic” and “mathematics”. The word “language” was semantically close to “usage” in the Analytic school, to “composition” in Idealism, and to “persuasion” in Aristotle, consistent with each school’s perspective. In Idealism, the similarity between “idea” and “reality” was particularly strong. In Plato’s writings, the embeddings showed high similarity between “idea” and “innate” as well as between “evil” and “ignorance”, supporting key Platonic beliefs. In the Empiricist slice, the concept of the “blank slate” emerged clearly in the proximity between “mind” and “empty”. Polarization analysis revealed that the darker tone of Nihilism was reflected in the embeddings trained on its texts. Finally, contrasting the meaning of “knowledge” across schools revealed its association with “experience” in Empiricism and with “intelligence” in Rationalism.
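The polarization analysis mentioned above can be illustrated with a WEAT-style association score, the kind of measure that SWEAT extends to multiple corpora. The sketch below is a simplification under stated assumptions, not the project's code or the SWEAT library's API: the function names are hypothetical, the target and attribute words are arbitrary, the toy vectors are invented, and significance testing and normalization are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_a = sum(cosine(word_vec, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(word_vec, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


def slice_polarization(embeddings, targets, positive, negative):
    """Sum the association scores of the target words within one slice."""
    pos_vecs = [embeddings[w] for w in positive]
    neg_vecs = [embeddings[w] for w in negative]
    return sum(association(embeddings[t], pos_vecs, neg_vecs) for t in targets)


# Toy vectors, invented for illustration only (NOT results from the project).
slice_dark = {"life": [0.0, 1.0], "good": [1.0, 0.0], "bad": [0.0, 1.0]}
slice_neutral = {"life": [1.0, 1.0], "good": [1.0, 0.0], "bad": [0.0, 1.0]}

score_dark = slice_polarization(slice_dark, ["life"], ["good"], ["bad"])
score_neutral = slice_polarization(slice_neutral, ["life"], ["good"], ["bad"])

# A negative difference means the targets lean further toward the negative
# attributes in the first slice than in the second.
difference = score_dark - score_neutral
print(score_dark, score_neutral, difference)
```

In this toy setup the target word sits on the negative attribute in the first slice and exactly between the two attributes in the second, so the first slice scores lower. With real embeddings, the attribute sets would normally be larger sentiment lexicons rather than a single word each.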

A concluding section reflects on the philosophical concepts that were not successfully captured by the word embeddings and discusses what methodological changes might yield improved results.

Tags

Data Semantics, Word Embeddings, Word2Vec, CADE, WEAT, NLP